Abstract. Driven by the need for better models that allow one to shed light
on the question of how life's diversity has evolved, phylogenetic networks have
now joined phylogenetic trees in the center of phylogenetics research. Like
phylogenetic trees, such networks canonically induce collections of phylogenetic
trees, clusters, and triplets, respectively. Thus it is not surprising that
many network approaches aim to reconstruct a phylogenetic network from such
collections. Related to the well-studied perfect phylogeny problem, the following
question is of fundamental importance in this context: When does one of
the above collections encode (i.e. uniquely describe) the network that induces
it?



In this note, we present a complete answer to this question for the special
case of a level-1 (phylogenetic) network by characterizing those level-1 networks
for which an encoding in terms of one (or equivalently all) of the above
collections exists. Given that this type of network forms the first layer of the
rich hierarchy of level-k networks, k a non-negative integer, it is natural to
wonder whether our arguments could be extended to members of that hierarchy
for higher values of k. By giving examples, we show that this is not the
case.

Keywords: Phylogeny, phylogenetic networks, triplets, clusters, supernetwork,
level-1 network, perfect phylogeny problem.



An improved understanding of the complex processes that drive evolution has
lent support to the idea that reticulate evolutionary events such as lateral gene
transfer or hybridization are more common than originally thought, rendering a
phylogenetic tree (essentially a rooted leaf-labelled graph-theoretical tree) too
simplistic a model to fully capture these processes.
Reflecting this, phylogenetic networks have now joined phylogenetic trees in the
center of phylogenetics research. Influenced by the diversity of questions posed by
evolutionary biologists that can be addressed with phylogenetic networks, various
alternative definitions of these types of networks have been developed over
the years [HB06]. These include split networks [BM04, BFSR95, HHML04] as
well as ancestral recombination graphs [SH05], TOM networks [Wil06], level-k
networks¹ with k a non-negative integer that in some sense captures how complex
the network structure is, networks for studying the evolution of polyploid organisms
[MH06], tree-child and tree-sibling networks [CLRV08], to name just a few.

Apart from split networks which aim to give an implicit model of evolution and 
are not the focus of this note, all other phylogenetic networks mentioned above aim 
to provide an explicit model of evolution. Although slightly different in detail, they 
are all based on the concept of a leaf-labelled rooted connected directed acyclic 
graph (see the next section for a definition). For the convenience of the reader, we 
depict an example of a phylogenetic network in the form of a level-1 network in
Fig. 1(a). Concerning these types of phylogenetic networks, it should be noted that
they are closely related to galled trees [WZZ01, GEL03] and that, in addition to
constituting the first layer of the rich hierarchy of level-k networks, they also form
a subclass of the large class of tree-sibling networks [AVP08].

Figure 1. (a) A level-1 phylogenetic network N. (b) and (c) The
phylogenetic trees that form the tree system T(N).

Due to the rich combinatorial structure of phylogenetic networks, different combinatorial
objects have been used to reconstruct them from biological data. For
a set X of taxa (e.g. species or organisms), these include cluster systems of X,
that is, collections of subsets of X [BD89, HR08], triplet systems on X, that is,
collections of phylogenetic trees with just three leaves which are generally called
(rooted) triplets [JS04, TH09], and tree systems, that is, collections of phylogenetic
trees which all have leaf set X [Sem07]. The underlying rationale is that any
phylogenetic network N induces a cluster system C(N), a triplet system ℛ(N) and
a tree system T(N). Again we defer the precise definitions to later sections of
this note and remark that for the level-1 network N with leaf set X = {a, b, ..., e}
depicted in Fig. 1(a), the cluster system C(N) consists of X, the five singleton sets
of X, and the subsets {a, b}, {c, d}, {b, c, d}, Y := {a, b, c, d}, and the tree system
T(N) consists of the phylogenetic trees depicted in Fig. 1(b) and (c), respectively.
Denoting a phylogenetic tree t on x, y, z such that the root of t is not the parent
vertex of x and y by z|xy (or equivalently by xy|z), the triplet system ℛ(N)
consists of all triplets of the form e|xy where x, y ∈ Y are distinct, x|cd with x ∈ {a, b},
and x|ab and a|bx with x ∈ {c, d}.

Although undoubtedly highly relevant for phylogenetic network reconstruction,
the following fundamental question has however remained largely unanswered so
far (the main exception being the case when N is in fact a phylogenetic tree, in
which case this question is closely related to the well-studied perfect phylogeny
problem; see e.g. [GH07] for a recent overview): When do the systems C(N),
ℛ(N), or T(N) induced by a phylogenetic network N encode N, that is, when is there
no other phylogenetic network N' for which the corresponding systems for N and
N' coincide?

¹ Note that these networks were originally introduced in [JS04], but the definition commonly
used now is slightly different with the main difference being that every vertex of the network with
indegree 2 must have outdegree 1 (see e.g. [vIKK+08] and the references therein).

Complementing the insights for when N is a phylogenetic tree alluded to above,
answers were recently provided for ℛ(N) in case N is a very special type of level-k
network, k ≥ 2, [vIKM09] and for T(N) for the special case that N is a regular
network [Wil09]. Although these are undoubtedly important first results, there are many
types of phylogenetic networks which are encoded by the tree system they induce but which
are not regular, or by the triplet system they induce but do not belong to that
special class of level-2 networks. An example for both cases is the level-1 network
depicted in Fig. 1(a). Although one might be tempted to speculate that all level-1
networks enjoy this property, this is not the case since the level-1 networks depicted
in Fig. 1(a) and Fig. 2(b), respectively, induce the same tree system and the same
triplet system. The main result of this paper shows that these observations are not
a coincidence. More precisely, in Theorem 1 we establish that a level-1 network N
is encoded by the triplet system ℛ(N) (or equivalently by the tree system T(N)
or equivalently the cluster system S(N) = S(T(N)) := ⋃_{T ∈ T(N)} C(T) which arises
in the context of the softwired interpretation of N [HR08] and contains C(N)) if
and only if, when ignoring directions, N does not contain a cycle of length 4.
Consequently the number of non-isomorphic (see below) phylogenetic networks N'
which all induce the same tree system (or equivalently the same triplet system or
the same cluster system S(N)) grows exponentially in the number of cycles of N of
length 4. It is of course highly tempting to speculate that a similar characterization
might hold for higher values of k. However, as our examples show, establishing such
a result will require an alternative approach since our arguments cannot be extended
to level-2 networks and thus to level-k networks with k ≥ 2.

This note is organized as follows. In the next section, we present the definition
of a level-1 network plus surrounding terminology. In Section 3, we present the
definitions of the cluster system C(N) and the tree system T(N) induced by a
phylogenetic network N. This also completes the definition of the cluster system S(N)
given in the introduction. Subsequent to this, we show that for any level-1 network
N, the cluster systems S(N) and C(N) are weak hierarchies (Proposition 1), which
are well-known in cluster analysis. In addition, we show that this property is not
enjoyed by level-2 networks and thus level-k networks with k ≥ 2. In Section 4, we
first present the definition of the triplet system ℛ(N) induced by a phylogenetic
network N. Subsequent to this, we turn our attention to the special case of encodings
of simple level-1 networks. In Section 5, we present our main result (Theorem 1).

To ease the presentation of our results, in all figures the (unique) root of a 
network is the top vertex and all arcs are directed downwards, away from the root. 
Furthermore, for any directed graph G, we denote the vertex set of G by V(G) and 
the set of arcs of G by A(G). 
2. Basic terminology and results concerning level-1 networks

In this section we present the definitions of a phylogenetic network and of a
level-k network, k ≥ 0. In addition we also provide the basic terminology surrounding
these structures.

Suppose X is a finite set. A phylogenetic network N on X is a rooted directed
acyclic graph (DAG) that satisfies the following additional properties. (i) The set
L(N) of leaves of N, that is, vertices with indegree 1 and outdegree 0, is X. (ii)
Exactly one vertex of N, called the root and denoted by ρ_N, has indegree 0 and
outdegree 2. (iii) All vertices of N that are not contained in L(N) ∪ {ρ_N} are either
split vertices, that is, have indegree 1 and outdegree 2, or reticulation vertices, that
is, have indegree 2 and outdegree 1. The set of reticulation vertices of N is denoted
by R(N). A phylogenetic network N for which R(N) is empty is called a (rooted)
phylogenetic tree (on X). Two phylogenetic networks N and N' which both have
leaf set X are said to be isomorphic if there exists a bijection from V(N) to V(N')
which is the identity on X and induces a graph isomorphism between N and N'.
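
The degree conditions (i) to (iii) can be checked mechanically. The following Python sketch is purely illustrative: the arc-list encoding, the function name classify_vertices and the three-leaf example network are assumptions made here and do not come from the text, and the sketch tests neither acyclicity nor connectivity.

```python
from collections import Counter

def classify_vertices(arcs, taxa):
    """Sort the vertices of a candidate network by (indegree, outdegree).

    arcs: list of (tail, head) pairs; taxa: the intended leaf set X.
    Raises ValueError if a vertex matches none of the allowed patterns.
    """
    indeg = Counter(head for _, head in arcs)
    outdeg = Counter(tail for tail, _ in arcs)
    patterns = {(0, 2): "root", (1, 0): "leaf",
                (1, 2): "split", (2, 1): "reticulation"}
    kinds = {name: [] for name in patterns.values()}
    for v in sorted(set(indeg) | set(outdeg)):
        degrees = (indeg[v], outdeg[v])
        if degrees not in patterns:
            raise ValueError(f"vertex {v!r} has forbidden degrees {degrees}")
        kinds[patterns[degrees]].append(v)
    if len(kinds["root"]) != 1:
        raise ValueError("expected exactly one root")
    if set(kinds["leaf"]) != set(taxa):
        raise ValueError("leaf set differs from X")
    return kinds

# Hypothetical network on X = {a, b, c}: leaf b hangs below the reticulation r.
ARCS = [("rho", "p"), ("rho", "q"), ("p", "a"), ("p", "r"),
        ("q", "r"), ("q", "c"), ("r", "b")]
kinds = classify_vertices(ARCS, {"a", "b", "c"})
```

For this example the vertices p and q are classified as split vertices and r as the only reticulation vertex, so R(N) = {r}.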

To present the definition of a level-k network, we need to introduce some terminology
concerning rooted DAGs first. Suppose G is a rooted connected DAG with
at least 2 vertices. Then we denote the graph obtained from G by ignoring the
directions on G by U(G). If H is a graph with at least 2 vertices then we call H
biconnected if H does not contain a vertex whose removal disconnects it. A biconnected
component of H is a maximal subgraph of H that is biconnected. If G is a
phylogenetic network and B is a rooted sub-DAG such that U(B) is a biconnected
component of U(G) then we call B a blob.

Following [vIKK+08], we call a phylogenetic network N a level-k network for
some non-negative integer k if each blob of N contains at most k reticulation vertices.
Note that some authors define a level-1 network N to be a phylogenetic
network without the above outdegree requirement on the reticulation vertices of
N (see e.g. [JS04]). Also, either on its own or in addition to the above,
the requirement that each blob contains at most k reticulation vertices is sometimes
replaced by the requirement that the cycles in U(N) are node disjoint (see
e.g. [JS04, JS06]). Although the definitions are the same in spirit, the difference is that a
cycle is generally understood to have at least three vertices, which implies that the
network depicted in Fig. 2(a) would not be a level-1 network. However, the definition
presented in [vIKK+08] would render that network a level-1 network. Having said
that, the network N depicted in Fig. 2(b) is a less parsimonious representation of
the same biological information (expressed in terms of the systems T(N), ℛ(N),
C(N), and S(N)) as the level-1 network in Fig. 2(a) in the sense that the edges in
grey are redundant for displaying that information. To avoid these types of level-1
networks, which cannot be encoded by any of the 4 systems of interest in this note,
we follow [vIKM09] and require that every blob in a level-1 network contains at
least 4 vertices.

Figure 2. The level-1 network N depicted in (a) induces and thus
represents the same triplet system ℛ(N), cluster systems C(N) and
S(N), and tree system T(N) as the level-1 network N' presented
in (b). However, N' is a less parsimonious representation of these
4 systems.

For k = 1, 2, it was shown in [vIKK+08] (see also [JS06] for the case k = 1)
that level-k networks can be built up by chaining together structurally very simple
level-k networks called simple level-k networks. Defined for general non-negative
integers k, these atomic building blocks are precisely those level-k networks that
can be obtained from a level-k generator by applying a certain "leaf hanging"
operation [vIKK+08] to its "sides". Such a generator is a biconnected directed
acyclic multi-graph which has a single root, precisely k pseudo-reticulation vertices
(i.e. vertices with indegree 2 and outdegree at most 1) and all other vertices are
split vertices, where the root and a split vertex are defined as in the case of a
phylogenetic network. For the convenience of the reader, we present in Fig. 3 the
unique level-1 generator and all 4 level-2 generators, which originally appeared in
slightly different form in [vIKK+08]. Regarding larger values for k, it was recently
shown in [Kel08] that there exist 65 level-3 generators. In addition, it was shown
in [GBP09] that there are 1993 level-4 generators and that the number of level-k
generators grows exponentially in k. A side of a generator G is an arc of G or one
of its pseudo-reticulation vertices.

Figure 3. The unique level-1 generator G^1, and the four level-2
generators G^2_a, G^2_b, G^2_c and G^2_d.

From now on and unless stated otherwise, all phylogenetic networks have leaf 
set X. 



3. The Systems C(N), T(N), and S(N)

In this section, we introduce for a phylogenetic network N the associated systems
C(N), T(N), and S(N) already mentioned in the introduction. In addition, we
prove that in case N is a level-1 network the associated systems C(N) and S(N)
are weak hierarchies. We conclude by presenting an example that shows that
level-k networks, k ≥ 2, do not enjoy this property in general. We start with some
definitions.
Suppose N is a phylogenetic network. Then we say that a vertex a ∈ V(N) is
below a vertex b ∈ V(N), denoted by a ≤_N b, if there exists a path P_{ba} (possibly of
length 0) from b to a. In this case, we also say that b is above a. Every vertex v ∈
V(N) therefore induces a non-empty subset C(v) = C_N(v) of X which comprises
all leaves of N below v (see e.g. [SS03]). We collect the subsets C(v) induced by
the vertices v of N this way in the set C(N), i.e. we put C(N) = ⋃_{v ∈ V(N)} {C(v)}.
For convenience, we refer to any collection C of non-empty subsets of X as a cluster
system (on X) and to the elements of C as clusters of X. It should be noted that
in case N is a binary phylogenetic tree, the cluster system C(N) is a hierarchy (on
X), that is, for any two clusters C_1, C_2 ∈ C(N) we have that C_1 ∩ C_2 ∈ {∅, C_1, C_2}.
Hierarchies are sometimes also called laminar families, and it is well-known that the
set of clusters C(T) induced by a binary phylogenetic tree T uniquely determines
that tree (see e.g. [SS03]).
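
To make the definition of C(N) and of a hierarchy concrete, the following Python sketch computes the clusters C(v) of a network stored as an arc list and tests the hierarchy condition. The encoding, the function names and both example graphs are assumptions chosen here for illustration and are not taken from the figures of this note.

```python
def clusters(arcs):
    """Return C(N) as a set of frozensets, one cluster C(v) per vertex v."""
    children = {}
    for tail, head in arcs:
        children.setdefault(tail, []).append(head)
        children.setdefault(head, [])
    memo = {}

    def below(v):
        # Leaves below v; a leaf is a vertex without outgoing arcs.
        if v not in memo:
            kids = children[v]
            memo[v] = (frozenset([v]) if not kids
                       else frozenset().union(*map(below, kids)))
        return memo[v]

    return {below(v) for v in children}

def is_hierarchy(system):
    """Check that any two clusters are disjoint or nested."""
    return all((c1 & c2) in (frozenset(), c1, c2)
               for c1 in system for c2 in system)

# Hypothetical binary tree ((a, b), c) and a hypothetical three-leaf network
# in which leaf b hangs below the reticulation vertex r.
TREE = [("rho", "u"), ("rho", "c"), ("u", "a"), ("u", "b")]
NETWORK = [("rho", "p"), ("rho", "q"), ("p", "a"), ("p", "r"),
           ("q", "r"), ("q", "c"), ("r", "b")]
tree_is_hierarchy = is_hierarchy(clusters(TREE))
network_is_hierarchy = is_hierarchy(clusters(NETWORK))
```

In the network example the clusters {a, b} and {b, c} overlap in {b} without being nested, so C(N) is not a hierarchy there, whereas the tree yields one.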

In the context of phylogenetic network construction, the concept of a weak
hierarchy (on X) was introduced in [BD89]. These objects are defined as follows.
Suppose C is a cluster system on X. Then C is called a weak hierarchy (on X) if

(1)  C_1 ∩ C_2 ∩ C_3 ∈ {C_1 ∩ C_2, C_2 ∩ C_3, C_1 ∩ C_3}

holds for any three elements C_1, C_2, C_3 ∈ C. Note that the above property is
sometimes also called the weak Helly property [SS03]. Also note that any hierarchy
is in particular a weak hierarchy and that any subset of a weak hierarchy is
again a weak hierarchy. Finally note that weak hierarchies are well-known objects
in classical hypergraph and abstract convexity theories [BD89] (see also the references
therein and [BBO04]), and that they were originally introduced into cluster
analysis as medinclus in [Bat88, Bat89].
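
The defining intersection property of a weak hierarchy can be tested by brute force over all triples of clusters. The sketch below is a minimal Python illustration; the function name is_weak_hierarchy is an assumption made here. Only triples of distinct clusters are examined, since the condition holds trivially when two of the three clusters coincide.

```python
from itertools import combinations

def is_weak_hierarchy(cluster_system):
    """Check the three-way intersection property on all triples of distinct clusters."""
    distinct = {frozenset(c) for c in cluster_system}
    for c1, c2, c3 in combinations(distinct, 3):
        if (c1 & c2 & c3) not in (c1 & c2, c2 & c3, c1 & c3):
            return False
    return True

# {a,b} ∩ {b,c} ∩ {a,b,c} = {b} = {a,b} ∩ {b,c}, so the property holds.
weak = is_weak_hierarchy([{"a", "b"}, {"b", "c"}, {"a", "b", "c"}])

# The three clusters used at the end of this section for a level-2 network:
# their common intersection {b} is none of {a,b}, {b,c}, {b,d}.
not_weak = is_weak_hierarchy([{"a", "b", "c"}, {"a", "b", "d"}, {"b", "c", "d"}])
```

The running time is cubic in the number of clusters, which is adequate for small examples.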

We will establish the main result of this section (Proposition 1) by showing that
the cluster system S(N) associated to a level-1 network N is a weak hierarchy.
To do this, we first need to complete the definition of S(N), which relies on the
definition of the system T(N). We will do this next.

Suppose N is a phylogenetic network. Then we say that a phylogenetic tree T is
displayed by N if the leaf set of T is X and T is a phylogenetic tree obtained from N
via the following process. For each reticulation vertex of N delete one incoming arc
and suppress any resulting degree 2 vertices. In case the root ρ_N of N is rendered
a vertex with outdegree 1 this way, we identify ρ_N with its unique child. The set
T(N) is the collection of all phylogenetic trees that are displayed by N. To every
vertex v ∈ V(N) a cluster system S_N(v) defined by putting

S_N(v) = {C_T(v) : T ∈ T(N)}

can be associated. Clearly, C_N(v) ∈ S_N(v) and S(N) = ⋃_{v ∈ V(N)} S_N(v).
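
The construction of S(N) can be sketched in Python by enumerating, for every reticulation vertex, which incoming arc is deleted, and collecting the clusters of the resulting trees. This is a sketch under assumptions made here: the arc-list encoding and the example network are hypothetical, the network is assumed to have no parallel arcs, and suppressed degree 2 vertices are simply left in place because they do not change the set of clusters. Vertices that end up with no leaf below them (which is not expected for level-1 networks) are ignored.

```python
from itertools import product

def softwired_clusters(arcs):
    """Return S(N): all clusters of the trees obtained by deleting one
    incoming arc of every reticulation vertex."""
    children, parents = {}, {}
    for tail, head in arcs:
        children.setdefault(tail, []).append(head)
        children.setdefault(head, [])
        parents.setdefault(head, []).append(tail)
    leaves = {v for v, kids in children.items() if not kids}
    reticulations = sorted(v for v, ps in parents.items() if len(ps) == 2)

    def tree_clusters(kept):
        kids = {v: [] for v in children}
        for tail, head in kept:
            kids[tail].append(head)
        memo = {}

        def below(v):
            if v not in memo:
                memo[v] = (frozenset([v]) if v in leaves
                           else frozenset().union(*(below(k) for k in kids[v])))
            return memo[v]

        return {below(v) for v in kids} - {frozenset()}

    result = set()
    for choice in product(range(2), repeat=len(reticulations)):
        deleted = {(parents[r][i], r) for r, i in zip(reticulations, choice)}
        result |= tree_clusters([arc for arc in arcs if arc not in deleted])
    return result

# Hypothetical three-leaf network: leaf b hangs below the reticulation r.
ARCS = [("rho", "p"), ("rho", "q"), ("p", "a"), ("p", "r"),
        ("q", "r"), ("q", "c"), ("r", "b")]
S = softwired_clusters(ARCS)
```

The enumeration visits 2^|R(N)| arc choices, so the sketch is only meant for small networks.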

To link clusters of X with level-1 networks on X, we say that a cluster C on X
is tree-consistent with a level-1 network N if C ∈ S(N). More generally, we say
that a cluster system C is tree-consistent with a level-1 network N if C ⊆ S(N)
holds. Thus, for any level-1 network N the cluster system S(N) equals the set of
all clusters of X that are tree-consistent with N. Finally, we say that a cluster
system C is level-1-consistent if there exists a level-1 network N such that C is
tree-consistent with N.

We next establish Proposition 1. Its proof relies on a characterization of a weak
hierarchy H on X from [BD89, Lemma 1] in terms of a property of a certain
H-closure that can be canonically associated to H. More precisely, suppose ∅ ≠ Y ⊆ X
and H is a cluster system on X. Then the H-closure ⟨Y⟩_H of Y is the intersection
⋂_{C ∈ H, Y ⊆ C} C. Now a cluster system H on X is a weak hierarchy if and only
if for every non-empty subset A ⊆ X there exist elements a, a' ∈ A such that
⟨A⟩_H = ⟨{a, a'}⟩_H. Note that this implies in particular that the number of elements
in a weak hierarchy is at most (|X|+1 choose 2) = |X|(|X|+1)/2 [BD89]. With regards to this
bound it should be noted that it was recently shown in [KNTX08] that the size of a cluster
system which is tree-consistent with a level-1 network N is linear in |X|. In view
of Proposition 1, this bound improves on the previous bound for this special kind
of weak hierarchy.
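
The closure criterion can be explored on small examples with the following Python sketch. The function names are assumptions made here, both example systems contain the ground set X (so that every closure is an intersection over a non-empty family), and the empty intersection is otherwise read as X, which is a convention chosen for this sketch.

```python
from itertools import combinations

def closure(subset, system, ground):
    """H-closure: intersection of all clusters of the system containing subset."""
    result = frozenset(ground)
    for cluster in system:
        if subset <= cluster:
            result &= cluster
    return result

def satisfies_closure_criterion(system, ground):
    """Check that every non-empty A has a, a' in A whose closure equals that of A."""
    system = [frozenset(c) for c in system]
    elements = sorted(ground)
    for size in range(1, len(elements) + 1):
        for chosen in combinations(elements, size):
            subset = frozenset(chosen)
            target = closure(subset, system, ground)
            if not any(closure(frozenset((x, y)), system, ground) == target
                       for x in subset for y in subset):
                return False
    return True

good = satisfies_closure_criterion(["a", "b", "c", "ab", "bc", "abc"], "abc")
bad = satisfies_closure_criterion(["abc", "abd", "bcd", "abcd"], "abcd")
```

In the second system the closure of X is X itself, whereas the closure of every pair is a proper subset of X, so the criterion fails, in line with the fact that {a, b, c}, {a, b, d}, {b, c, d} violate the weak hierarchy condition. The loop runs over all subsets of X and is therefore exponential in |X|.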

Proposition 1. A level-1-consistent cluster system is a weak hierarchy. In particular,
the systems S(N) and C(N) associated to a level-1 network N are weak
hierarchies.

Proof: Since every subset of a weak hierarchy is again a weak hierarchy, it suffices
to show that for every level-1 network N the associated cluster system S(N) is a
weak hierarchy. To see this suppose N is a level-1 network on X = {x_1, ..., x_n},
n ≥ 1. Consider a graphical representation of N and, starting from the leftmost
leaf of N in that representation, let x_1, ..., x_n denote the induced ordering of the
leaves of N (note that this might involve re-labelling some of the elements in X).
Suppose ∅ ≠ A ⊆ X. Let i, j ∈ {1, ..., n} be such that x_j ∈ A and every leaf in
X succeeding x_j in that ordering is not contained in A. Similarly, let x_i ∈ A be
such that every leaf in X preceding x_i in that ordering is not contained in A. We
claim that ⟨A⟩_{S(N)} = ⟨{x_i, x_j}⟩_{S(N)}. To see this, note that since N is a level-1
network, there exists a subtree T of N such that the leaf set of T is A. Note that
T might contain vertices whose indegree and outdegree is one. By deleting for each
reticulation vertex below the root of T one of its incoming arcs and suppressing
the resulting degree 2 vertex, T can be canonically extended to a subtree T' of
some tree T'' ∈ T(N) such that {x_i, x_{i+1}, ..., x_j} ⊆ L(T') and L(T') is minimal
with regards to set inclusion. Note that L(T') ∈ S(N). But then, by construction,
⟨A⟩_{S(N)} = L(T') = ⟨{x_i, x_j}⟩_{S(N)}, which proves the claim. ∎

We remark in passing that to any cluster system C of X a similarity measure
D_C : X × X → ℝ can be associated by putting D_C(a, b) = |{C' ∈ C : a, b ∈ C'}|,
a, b ∈ X. Proposition 1 combined with the main result from [BD89] implies that any
tree-consistent cluster system C can be uniquely reconstructed from its associated
similarity measure D_C. Using the well-known Farris transform (see e.g. [SS03], and
[DHM07] for a recent overview) a similarity measure can be canonically transformed
into a distance measure on X, that is, a map from X × X into the non-negative
reals that is symmetric, satisfies the triangle inequality, and vanishes on the main
diagonal. The latter measures were recently investigated in [CJLY05] from an
algorithmic point of view in the context of representing them in terms of an
ultrametric level-1 network. These are generalizations of ultrametric phylogenetic
trees in the sense that every path from the root of the network to any leaf is of the
same length.
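
The similarity measure counts, for two elements of X, the clusters containing both. A minimal Python sketch follows; the function name similarity and the example cluster system are assumptions made here for illustration, and the Farris transform itself is not implemented.

```python
def similarity(system, a, b):
    """Number of clusters of the system that contain both a and b."""
    return sum(1 for cluster in system if a in cluster and b in cluster)

# Hypothetical cluster system on X = {a, b, c}.
SYSTEM = [frozenset(s) for s in ("a", "b", "c", "ab", "bc", "abc")]
d_ab = similarity(SYSTEM, "a", "b")   # clusters {a,b} and {a,b,c}
d_ac = similarity(SYSTEM, "a", "c")   # only {a,b,c}
d_bb = similarity(SYSTEM, "b", "b")   # all clusters containing b
```

The measure is symmetric by construction, and its value on the diagonal is the number of clusters containing the given element.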

We conclude this section by remarking that, as the example of the level-2
network N presented in Fig. 4(a) shows, the result analogous to Proposition 1
does not hold for level-2 networks since {{a, b, c}, {a, b, d}, {b, c, d}} ⊆ S(N) but
{a, b, c} ∩ {a, b, d} ∩ {b, c, d} = {b}, which is none of the pairwise intersections
{a, b}, {b, c}, {b, d}. Furthermore, we remark that, as the example of
the level-2 network N depicted in Fig. 4(b) combined with the cluster system C(T)
induced by the phylogenetic tree T depicted in Fig. 4(c) shows, a cluster system
C(T) induced by a phylogenetic tree T can be contained in the cluster system S(N)
of a level-2 network N even though N need not display T.

Figure 4. (a) A level-2 network N for which S(N) is not a weak
hierarchy. The phylogenetic network N depicted in (b) does not
display the phylogenetic tree T depicted in (c) but C(T) is
tree-consistent with N.

4. Simple level-1 Networks 

In this section we turn our attention to studying simple level-1 networks. In
particular, we establish a fundamental property of these networks with regards to
encodings of level-1 networks. To do this, we require some more definitions. We
start with the definition of the triplet system ℛ(N) induced by a phylogenetic
network N.

Suppose N is a phylogenetic network. If Y ⊆ X is a subset of X of size 3, then
N induces a triplet t on X by taking t to be a minimal subtree of N with leaf set
Y and suppressing resulting degree two vertices of t. The set of triplets induced
on X by N this way is the triplet system ℛ(N). Two properties of this triplet
system should be noted. First, every triplet in ℛ(N) is consistent with N, where
a triplet x|yz is called consistent with a phylogenetic network N if x, y, z ∈ X and
there exist two vertices u, v ∈ V(N) and pairwise internally vertex-disjoint paths
in N from u to y, u to z, v to u and v to x. Note that a triplet system ℛ is called
consistent with a phylogenetic network N if every triplet in ℛ is consistent with N.
For convenience, we will sometimes say that a phylogenetic network N is consistent
with a triplet t (or a triplet system ℛ) if t (or ℛ) is consistent with N. In case ℛ
is consistent with a phylogenetic network N and ℛ = ℛ(N) then we say that ℛ
reflects N. Alternatively, we will say that ℛ is reflected by N. For example, the
triplet set ℛ = {a|bc, c|ab} is reflected by the three simple level-1 networks SL_1^i(ℛ),
i ∈ {1, 2, 3}, on {a, b, c} depicted in Fig. 5 which appeared in slightly different form
in [JNS06].

Figure 5. The three non-isomorphic simple level-1 networks SL_1^1(ℛ), SL_1^2(ℛ)
and SL_1^3(ℛ) on {a, b, c} that all reflect the triplet system ℛ = {a|bc, c|ab}.
Second, the triplet system ℛ(N) is always dense, where a triplet system ℛ on X
is called dense if for any three elements a, b, c ∈ X there exists a triplet t ∈ ℛ such
that L(t) = {a, b, c}. Although unassuming at first sight, the concept of a dense triplet
set has proven vital for level-k network reconstruction, k ≥ 1, from triplet systems.
More precisely, the only known polynomial time algorithms for constructing level-1
and level-2 networks N consistent with such triplet systems construct N (if it
exists) by essentially building it up recursively from simple level-1 and simple level-2
networks [JS06, vIKK+08]. If the assumption that ℛ is dense is dropped, however,
then it is NP-hard to decide if there exists a level-k network, k = 1, 2, consistent
with ℛ [JS06, vIKK+08]. For larger values of k, a polynomial time algorithm for
constructing a level-k network from a dense triplet set was recently presented in
[TH09].
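
Density of a triplet system is straightforward to test. In the Python sketch below, a triplet xy|z is encoded as the pair ((x, y), z); this encoding and the function name is_dense are assumptions made here for illustration.

```python
from itertools import combinations

def is_dense(triplets, ground):
    """Check that every 3-element subset of the ground set is the leaf set
    of at least one triplet; a triplet xy|z is given as ((x, y), z)."""
    covered = {frozenset(pair) | {out} for pair, out in triplets}
    return all(frozenset(three) in covered
               for three in combinations(sorted(ground), 3))

# The triplet set {a|bc, c|ab} from the example above.
TRIPLETS = [(("b", "c"), "a"), (("a", "b"), "c")]
dense_on_abc = is_dense(TRIPLETS, "abc")
dense_on_abcd = is_dense(TRIPLETS, "abcd")
```

The same triplet set is dense on {a, b, c} but not on {a, b, c, d}, since no triplet involves d.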

The next result is rather technical² but plays a crucial role in the proof of our
main result (Theorem 1) as it shows that, although all three simple level-1 networks
depicted in Fig. 5 reflect the same triplet set, this property is lost when adding an
additional leaf to a non-cut-arc of each of them. Here, the cut-arcs of a directed graph G
are the elements in A(G) whose removal disconnects G, and all other arcs are non-cut-arcs.
To establish our result, we require some more definitions and notations.

Suppose N is a phylogenetic network and a, b ∈ V(N) such that a is below b. If
c is a further vertex in V(N) and a ≤_N b and c ≤_N b holds then we call b a common
ancestor of a and c. A lowest common ancestor lca_N(a, c) of a and c is a common
ancestor of a and c such that no other vertex below lca_N(a, c) is a common ancestor of
a and c. Note that in a level-0 or level-1 network N, the lowest common ancestor
of any two distinct leaves of N is always unique whereas this need not be the
case for level-k networks with larger k.

Now suppose N is one of the simple level-1 networks SL_1^i(ℛ), i ∈ {1, 2, 3}, on
X = {a, b, c} depicted in Fig. 5. Let e = uv ∈ A(N) be a non-cut-arc and suppose
that d ∉ X. Then we denote by N^e ⊕ d the level-1 network obtained from N by
adding a new vertex w to V(N) and replacing e by the arcs uw, wv, and wd. We
remark that if the knowledge of e is of no relevance to the presented argument,
then we will write N ⊕ d rather than N^e ⊕ d.



² A case analysis based alternative proof of this result may be found in [GBP08].
Lemma 1. Suppose X = {a, b, c, d} and ℛ = {a|bc, c|ab}. Then, for any two
distinct i, j ∈ {1, 2, 3},

ℛ(SL_1^i(ℛ) ⊕ d) ≠ ℛ(SL_1^j(ℛ) ⊕ d).

Proof: Put N_k := SL_1^k(ℛ), k ∈ {1, 2, 3}, and assume that there exist distinct
i, j ∈ {1, 2, 3} and non-cut-arcs e_i ∈ A(N_i) and e_j ∈ A(N_j) such that ℛ(N_i^{e_i} ⊕ d) =
ℛ(N_j^{e_j} ⊕ d). By symmetry, it suffices to consider the cases (i, j) ∈ {(2, 1), (2, 3)}. For
k ∈ {1, 2, 3}, let u_k, v_k ∈ V(N_k) such that e_k = u_k v_k. Also for k ∈ {1, 2, 3}, let w_k ∉
V(N_k) denote the new vertex in V(N_k^{e_k} ⊕ d) such that by replacing the arc e_k by the
arcs u_k w_k, w_k v_k, and adding the arc w_k d the new network N_k^{e_k} ⊕ d is obtained from
N_k. Note that for all k ∈ {1, 2, 3}, both N_k and N_k^{e_k} ⊕ d have the same root and the
same reticulation vertex which we denote by ρ_k and r_k, respectively. Furthermore,
for all x, y ∈ {a, b, c} we have lca_{N_k}(x, y) = lca_{N_k^{e_k} ⊕ d}(x, y). We distinguish the cases
that u_2 = ρ_2 and that u_2 ≠ ρ_2.

Suppose first that u_2 = ρ_2 and put l = lca_{N_2}(a, b). Then e_2 ∈ {ρ_2 r_2, ρ_2 l}.
We first establish that j ≠ 1. Assume for contradiction that j = 1. For all
s, t ∈ V(N_2^{e_2} ⊕ d) such that t is below s denote a path from s to t in N_2^{e_2} ⊕ d
by P_{st}. Observe that d|ac ∈ ℛ(N_2^{e_2} ⊕ d), e_2 ∈ {ρ_2 r_2, ρ_2 l}, holds. Indeed, since
c|ab ∈ ℛ(N_2) the paths P_{la} and P_{lc} exist and do not have an internal vertex in
common. Furthermore, w_2 ≠ l and either the arc w_2 l or the arcs ρ_2 l and ρ_2 w_2
exist. In both cases the paths P_{w_2 l}, P_{ρ_2 l} and P_{ρ_2 w_2}, consisting of the arcs w_2 l, ρ_2 l
and ρ_2 w_2, respectively, do not have an internal vertex in common with either P_{la}
or P_{lc}. Thus, d|ac ∈ ℛ(N_2^{e_2} ⊕ d), as required. By assumption, d|ac ∈ ℛ(N_1^{e_1} ⊕ d)
follows, which is impossible since lca_{N_1}(a, c) = ρ_1 and so d|ac ∉ ℛ(N_1^e ⊕ d) for all
non-cut-arcs e ∈ A(N_1). Thus, j ≠ 1, as required.

If j = 3 then ℛ(N_2^{e_2} ⊕ d) = ℛ(N_3^{e_3} ⊕ d) and v_2 = r_2 or v_2 = l. If v_2 = r_2
then b|cd ∈ ℛ(N_2^{e_2} ⊕ d) = ℛ(N_3^{e_3} ⊕ d) follows. But then w_3 cannot be a vertex
on the path in N_3^{e_3} ⊕ d from ρ_3 to b or on the path from ρ_3 to r_3 which avoids
lca_{N_3}(a, c). Thus, u_3 = lca_{N_3}(a, b) and so b|da ∈ ℛ(N_3^{e_3} ⊕ d) = ℛ(N_2^{e_2} ⊕ d), which is
impossible as lca_{N_2^{e_2} ⊕ d}(a, d) = ρ_2 and thus always above l. If v_2 = l then d|bc, c|bd ∈
ℛ(N_2^{e_2} ⊕ d) = ℛ(N_3^{e_3} ⊕ d) follows, which is again impossible since if w_3 lies on the
path from lca_{N_3}(a, c) to r_3 then d|bc ∉ ℛ(N_3^{e_3} ⊕ d) and if not then u_3 = ρ_3 and so
c|bd ∉ ℛ(N_3^{e_3} ⊕ d). Thus, j ≠ 3.

Now suppose that u_2 ≠ ρ_2. Then u_2 ∈ {l, lca_{N_2}(b, c)}. Observe that arguments
similar to the previous ones imply that a|cd, a|bd, c|ad, c|bd ∈ ℛ(N_2^{e_2} ⊕ d) holds for all
u_2 ∈ {l, lca_{N_2}(b, c)}. If j = 1 then u_1 ≠ ρ_1 as otherwise a|bd or c|bd does not belong
to ℛ(N_1^{e_1} ⊕ d) = ℛ(N_2^{e_2} ⊕ d). Thus v_1 = r_1 and u_1 ∈ {lca_{N_1}(b, a), lca_{N_1}(b, c)}. If
u_1 = lca_{N_1}(b, a) then a|cd ∉ ℛ(N_1^{e_1} ⊕ d), which is impossible. Swapping the roles of
a and c in the previous argument shows that u_1 = lca_{N_1}(b, c) cannot hold either.
Thus, j ≠ 1.

If j = 3 then again since a|cd, c|ad ∈ ℛ(N_3^{e_3} ⊕ d), it follows that e_3 must be
an arc on the path P from lca_{N_3}(a, c) to r_3. Note that similar arguments as the
ones used above imply that either d|bc or b|cd is contained in ℛ(N_2^{e_2} ⊕ d). But
b|cd ∉ ℛ(N_3^{e_3} ⊕ d) = ℛ(N_2^{e_2} ⊕ d) and so d|bc ∈ ℛ(N_2^{e_2} ⊕ d) must hold. But this is
impossible since then c|ad, d|bc ∈ ℛ(N_2^{e_2} ⊕ d) but there exists no non-cut-arc e on
P such that both triplets are simultaneously contained in ℛ(N_3^e ⊕ d). Thus, j ≠ 3. ∎



5. Encodings of Level-1 Networks



In this section, we characterize those level-1 networks N that are encoded by
the triplet system ℛ(N), or equivalently the tree system T(N), or equivalently
the cluster system S(N) they induce. In addition, we present an example that
illustrates that our arguments cannot be extended to establish the corresponding
result for level-2 networks and therefore to level-k networks with k ≥ 3.

Bearing in mind that there exist triplet systems which can be reflected by more
than one level-1 network, we denote the collection of all level-1 networks that reflect
a triplet system ℛ by L_1(ℛ). Clearly, if ℛ is reflected by a level-1 network N then
N ∈ L_1(ℛ(N)) and so |L_1(ℛ(N))| ≥ 1. Similarly, we denote for a tree system T
the collection of all level-1 networks N for which T = T(N) holds by L_1(T), and
for a cluster system C the collection of all level-1 networks N for which C = S(N)
holds by L_1(C). As in the case of triplet systems, there exist tree systems T and
cluster systems C with |L_1(T)| > 1 and |L_1(C)| > 1, respectively.

Clearly, any cluster C ⊆ X induces a triplet system ℛ(C) of triplets on X defined
by putting

ℛ(C) = {c_1c_2|x : c_1, c_2 ∈ C and x ∈ X − C}.

Thus, any non-empty cluster system C on X induces a triplet system ℛ(C) defined
by putting ℛ(C) := ⋃_{C' ∈ C} ℛ(C'). The next result establishes a link between the
triplet system induced by a level-1 network N and the triplet system ℛ(S(N)).
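
The passage from clusters to triplets can be sketched in a few lines of Python. The encoding of a triplet c_1c_2|x as the pair ({c_1, c_2}, x), the function names and the example cluster system are assumptions made here; the example system is the one that a hypothetical three-leaf level-1 network with leaf b below its reticulation vertex would induce as S(N).

```python
from itertools import combinations

def triplets_of_cluster(cluster, ground):
    """All triplets c1c2|x with distinct c1, c2 in the cluster and x outside it."""
    outside = set(ground) - set(cluster)
    return {(frozenset(pair), x)
            for pair in combinations(sorted(cluster), 2) for x in outside}

def triplets_of_system(system, ground):
    """Union of the triplet systems induced by the clusters of the system."""
    result = set()
    for cluster in system:
        result |= triplets_of_cluster(cluster, ground)
    return result

# Hypothetical cluster system on X = {a, b, c}.
SYSTEM = ["a", "b", "c", "ab", "bc", "abc"]
R = triplets_of_system(SYSTEM, "abc")
```

Only the clusters {a, b} and {b, c} contribute, giving ab|c and bc|a, that is, the triplet set {a|bc, c|ab} considered in Section 4.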

Lemma 2. Suppose N is a level-1 network with at least 3 leaves. Then

ℛ(N) = ⋃_{T ∈ T(N)} ℛ(T) = ⋃_{C ∈ S(N)} ℛ(C).



Proof: That $\bigcup_{T \in \mathcal{T}(N)} \mathcal{R}(T) = \bigcup_{C \in \mathcal{S}(N)} \mathcal{R}(C)$
holds is trivial. Also it is straightforward to see that
$\bigcup_{T \in \mathcal{T}(N)} \mathcal{R}(T) \subseteq \mathcal{R}(N)$. To see the
converse set inclusion, suppose that $t \in \mathcal{R}(N)$. Let $x_1, x_2, x_3 \in X$
such that $t = x_1 x_2|x_3$. Then with $lca(x_1, x_2) := lca_N(x_1, x_2)$ we have
$x_3 \notin C_N(lca_N(x_1, x_2))$ and $lca(x_1, x_2)$ does not equal the root $\rho_N$
of $N$. Let $P_i$ denote a path from $\rho_N$ to $x_i$, $i = 1, 2$, and let $T$
denote the phylogenetic tree on $X$ obtained from $N$ by modifying all reticulation
vertices $v$ of $N$ in the following way. If $v \notin V(P_1) \cup V(P_2)$ then
randomly delete one of the incoming arcs of $v$ and suppress the resulting degree 2
vertex. If this results in the decrease of the outdegree of the root of $N$ then
identify $\rho_N$ with its unique child. If $v \in V(P_i)$, $i = 1, 2$, then delete
that incoming arc of $v$ that is not an arc of $P_i$ and suppress the resulting
degree 2 vertex. Clearly, $T$ is displayed by $N$ and so
$t \in \bigcup_{T \in \mathcal{T}(N)} \mathcal{R}(T)$. Thus,
$\mathcal{R}(N) \subseteq \bigcup_{T \in \mathcal{T}(N)} \mathcal{R}(T)$ must hold,
which implies the lemma. ∎
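As a small sanity check of Lemma 2, consider a hypothetical simple level-1 network $N$ on $X = \{a, b, c\}$ (chosen here for illustration, not taken from the paper's figures) with root $\rho$, tree vertices $u, v$, reticulation vertex $h$, and arcs $\rho u, \rho v, uh, vh, ua, vc, hb$. Deleting either incoming arc of $h$ gives the two displayed trees, and, assuming $\mathcal{S}(N)$ is the union of the cluster systems of the displayed trees, the three sets in the lemma can be written out directly:

```latex
\begin{align*}
\mathcal{T}(N) &= \{\, ((a,b),c),\ (a,(b,c)) \,\},\\
\mathcal{S}(N) &= \{\, \{a\},\ \{b\},\ \{c\},\ \{a,b\},\ \{b,c\},\ \{a,b,c\} \,\},\\
\bigcup_{T \in \mathcal{T}(N)} \mathcal{R}(T) &= \{ab|c\} \cup \{bc|a\} = \{ab|c,\ bc|a\},\\
\bigcup_{C \in \mathcal{S}(N)} \mathcal{R}(C) &= \mathcal{R}(\{a,b\}) \cup \mathcal{R}(\{b,c\}) = \{ab|c,\ bc|a\}.
\end{align*}
```

Both unions agree with $\mathcal{R}(N) = \{ab|c,\ bc|a\}$. The triplet $ac|b$ is absent because the lowest common ancestor of $a$ and $c$ is the root.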
Note that as the example of the level-2 network depicted in Fig. 6 shows, the
relationship between the triplet system of a level-1 network $N$ and the triplet
system induced by the clusters in $\mathcal{S}(N)$ does not hold for level-2 networks.




Figure 6. A level-2 phylogenetic network with $c_1 c_2|x_1 \in \mathcal{R}(N)$,
but $\{c_1, c_2\} \notin \mathcal{S}(N)$.

To prove the main result of this note (Theorem 1), which we will do next, we
require some additional definitions and notations. Suppose $N$ is a phylogenetic
network. Then we call a subset $\{x, y\} \subseteq X$ a cherry of $N$ if there exists
a vertex $v \in V(N)$ such that $vx, vy \in A(N)$. Furthermore, if $N$ is a level-1
network and $x \in X$ then we denote by $N - x$ the level-1 network obtained from $N$
by removing $x$ (and its incident arc) and suppressing the resulting degree 2 vertex.
In addition, we say that $N$ is a strict level-1 network if $N$ is not a phylogenetic
tree. Finally, to a triplet system $\mathcal{R}$ and some
$x \in \bigcup_{t \in \mathcal{R}} L(t)$ we associate the triplet set
$\mathcal{R}_x := \{t \in \mathcal{R} : x \notin L(t)\}$.
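Two of these definitions, cherries and the restricted triplet set $\mathcal{R}_x$, translate directly into code. The sketch below assumes a network is given as a list of arcs `(u, v)` and that a triplet $ab|c$ is stored as `(frozenset({a, b}), c)`; both encodings and the function names are illustrative choices made here.

```python
from itertools import combinations

def cherries(arcs, leaves):
    """All cherries {x, y}: pairs of leaves x, y with arcs vx and vy for a common vertex v."""
    leaves = set(leaves)
    leaf_children = {}
    for u, v in arcs:
        if v in leaves:
            leaf_children.setdefault(u, set()).add(v)
    return {frozenset(pair)
            for kids in leaf_children.values()
            for pair in combinations(sorted(kids), 2)}

def restrict(triplets, x):
    """R_x: the triplets of R whose leaf set L(t) does not contain x."""
    return {(pair, z) for pair, z in triplets if x != z and x not in pair}

# The tree ((a,b),c) with hypothetical internal vertex names r and u.
tree_arcs = [("r", "u"), ("r", "c"), ("u", "a"), ("u", "b")]
found = cherries(tree_arcs, {"a", "b", "c"})

# Triplets of the caterpillar tree (((a,b),c),d), restricted by removing d.
R = {(frozenset({"a", "b"}), "c"), (frozenset({"a", "b"}), "d"),
     (frozenset({"a", "c"}), "d"), (frozenset({"b", "c"}), "d")}
R_d = restrict(R, "d")
```

Here `found` contains the single cherry $\{a, b\}$ and `R_d` contains only $ab|c$, the triplet system of the tree with $d$ removed.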

Armed with these definitions and notations we are now ready to establish our 
main result. 

Theorem 1. Suppose $N$ is a level-1 network with at least 3 leaves. Then the
following statements are equivalent:

(i) $N$ contains a blob with four vertices.
(ii) $|\mathcal{L}_1(\mathcal{R}(N))| > 1$.
(iii) $|\mathcal{L}_1(\mathcal{S}(N))| > 1$.
(iv) $|\mathcal{L}_1(\mathcal{T}(N))| > 1$.

Proof: (i) $\Rightarrow$ (iv): This is an immediate consequence of the fact that all
simple level-1 networks depicted in Fig. 5 induce the same set of phylogenetic trees.

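(iv) $\Rightarrow$ (ii): Suppose that $N$ is a level-1 network such that
$|\mathcal{L}_1(\mathcal{T}(N))| > 1$. Then there exists a level-1 network $N'$
distinct from $N$ such that $\mathcal{T}(N) = \mathcal{T}(N')$. Combined with Lemma 2,
$\mathcal{R}(N) = \bigcup_{T \in \mathcal{T}(N)} \mathcal{R}(T) = \bigcup_{T \in \mathcal{T}(N')} \mathcal{R}(T) = \mathcal{R}(N')$
follows and so $N' \in \mathcal{L}_1(\mathcal{R}(N))$. Thus,
$|\mathcal{L}_1(\mathcal{R}(N))| > 1$, as required.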

(ii) $\Rightarrow$ (i): We will show by induction on the number $n$ of leaves of $N$
that if every blob in $N$ contains at least 5 vertices then
$|\mathcal{L}_1(\mathcal{R}(N))| = 1$. Suppose $N$ is a level-1 network with $n$
leaves such that every blob of $N$ contains at least 5 vertices. Note that we may
assume that $N$ contains at least one blob since otherwise $N$ is a phylogenetic tree
and so $|\mathcal{L}_1(\mathcal{R}(N))| = 1$ clearly holds. But then $n \geq 4$. If
$n = 4$ then, using Lemma 1, it is straightforward to verify that
$|\mathcal{L}_1(\mathcal{R}(N))| = 1$.
Suppose $n > 4$. Assume for every level-1 network $N_0$ with $n_0 < n$ leaves that
$|\mathcal{L}_1(\mathcal{R}(N_0))| = 1$ holds whenever $N_0$ is a phylogenetic tree or
every blob in $N_0$ contains at least 5 vertices. Suppose for contradiction that
$|\mathcal{L}_1(\mathcal{R}(N))| \geq 2$. Choose some
$N' \in \mathcal{L}_1(\mathcal{R}(N))$ distinct from $N$. Then
$\mathcal{R} := \mathcal{R}(N) = \mathcal{R}(N')$. We distinguish the cases that $N$
contains a cherry and that it does not.

Suppose first that $N$ contains a cherry $\{x, y\}$. Without loss of generality, we
may assume that this cherry is as far away from the root of $N$ as possible. Then
since $N$ is a strict level-1 network all of whose blobs contain at least 5 vertices,
$N - x$ must enjoy the same property with regards to its blobs (if $N - x$ still has
blobs). But then, by induction hypothesis, $|\mathcal{L}_1(\mathcal{R}(N - x))| = 1$
and so $N - x$ is the unique level-1 network that reflects
$\mathcal{R}(N - x) = \mathcal{R}_x$. Since by the choice of $x$, for every leaf $z$
in $N$ distinct from $x$ and $y$, only the triplet $z|xy$ out of the 3 possible
triplets on $\{x, y, z\}$ is contained in $\mathcal{R} = \mathcal{R}(N')$, it follows
that $\{x, y\}$ must also be a cherry in $N'$. But then $N = N'$, which is impossible.
Thus, $|\mathcal{L}_1(\mathcal{R}(N))| = 1$ must hold in this case.

Now suppose that $N$ does not contain a cherry. Then there exists a blob $B$ in $N$
such that all cut-arcs that start with a vertex in $B$ must end in a leaf of $N$. For
each such leaf $z$, which we will also call a leaf of $B$, we denote by $z'$ the
vertex of $B$ such that $z'z$ is that cut-arc of $N$. Furthermore, denote by $p$ the
leaf of $B$ such that $p'$ is the reticulation vertex in $B$. Let $y_1$ and $y_2$ be
the vertices in $V(N) - V(B)$ such that $y_1'$ and $y_2'$ are the two parent vertices
of $p'$ in $B$. Note that the root $\rho = \rho_B$ of $B$ could be $y_1'$ or $y_2'$
but not both and that whenever $y_i' \neq \rho$, $i = 1, 2$, then $y_i$ is a leaf of
$B$ (hence the abuse of notation). Without loss of generality, we may assume that the
path $P_{\rho y_1'}$ from $\rho$ to $y_1'$ in $B$ is at least as long as the path
$P_{\rho y_2'}$ from $\rho$ to $y_2'$ in $B$ (where we allow paths of length zero).
Thus, $y_1$ must be a leaf of $B$. Since $P_{\rho y_1'}$ is at least as long as
$P_{\rho y_2'}$ and, by assumption on $N$, $B$ contains at least 5 vertices, there
must exist a leaf $y$ of $B$ distinct from $y_1$ such that $y' \in V(P_{\rho y_1'})$.
Note that we may assume without loss of generality that $y'$ is the predecessor of
$y_1'$ on that path. We distinguish the cases that $|V(B)| > 5$ and that $|V(B)| = 5$.

Suppose first $|V(B)| > 5$. Since a blob in the level-1 network $N - y_1$ has clearly
at least 5 vertices, we have $|\mathcal{L}_1(\mathcal{R}(N - y_1))| = 1$ by the
induction hypothesis. But then $N - y_1$ is the unique level-1 network that reflects
$\mathcal{R}(N - y_1) = \mathcal{R}_{y_1}$. Consequently, since
$\mathcal{R} = \mathcal{R}(N')$ we have $N' - y_1 = N - y_1$. To see that $N$ equals
$N'$ suppose $z$ is a leaf of $B$ distinct from $y_1, y, p$ (which must exist by
assumption on $B$). Then either $t := z|yp,\ p|yz \in \mathcal{R}$ or
$t,\ y|zp \in \mathcal{R}$ holds. We only discuss the case that
$t,\ p|yz \in \mathcal{R}$ since the case $t,\ y|zp \in \mathcal{R}$ is symmetric. Let
$B^-$ denote the blob in $N - y_1$ obtained from $B$ by deleting $y_1$ plus its
incident arc and suppressing the resulting degree 2 vertex. Since $z$, $y$, and $p$
are leaves of $B^-$ and the choice of $y_1$ implies that
$y_1 y|p,\ y|y_1 p \in \mathcal{R}(N) = \mathcal{R}(N')$, it follows that there exists
some blob $B'$ in $N'$ such that $B^- = B' - y_1$. Moreover, the suppressed degree 2
vertex of $V(B')$ is adjacent (in $B'$) with $y'$ and $p'$, respectively, since
otherwise $y_1|yp \in \mathcal{R}(N') = \mathcal{R}(N)$ would hold, which contradicts
the choice of $y_1$. Thus $N = N'$ and so $|\mathcal{L}_1(\mathcal{R}(N))| = 1$ must
hold in case $|V(B)| > 5$.

We conclude with analyzing the case $|V(B)| = 5$. Then either $\rho = y_2'$ and so
$B$ has, in addition to the leaves $y_1, y, p$, precisely one more leaf $z$, or
$\rho \neq y_2'$ and the leaves of $B$ are $y_1, y_2, y$ and $p$. We first consider
the case $\rho \neq y_2'$. Consider the level-1 network $N - \{y_1, y_1'\}$ obtained
from $N$ by removing $y_1$, its parent vertex $y_1'$ and their 3 incident arcs (plus
suppressing resulting degree 2 vertices), thus effectively turning $B$ into a
phylogenetic tree on the leaves $y, p, y_2$, i.e. the triplet $t := y|p y_2$. Put
$\mathcal{R}^* := \mathcal{R}_{y_1} \cup \{t\}$. Since either $N - \{y_1, y_1'\}$ is a
phylogenetic tree or a strict level-1 network such that each of its blobs contains at
least 5 vertices, the induction hypothesis implies
$|\mathcal{L}_1(\mathcal{R}(N - \{y_1, y_1'\}))| = 1$. Thus, $N - \{y_1, y_1'\}$ is
the unique level-1 network that reflects $\mathcal{R}^*$. Note that the only way to
turn $N - \{y_1, y_1'\}$ into a level-1 network that, in addition to reflecting
$\mathcal{R}^*$, is also consistent with $t' := y_2|py \in \mathcal{R}$ is to replace
$t$ by one of the level-1 networks $SL_1^j(\{t, t'\})$, $j \in Y := \{1, 2, 3\}$.
Denote that element in $Y$ by $j_N$. Since $\mathcal{R}(N) = \mathcal{R}(N')$ it
follows that the level-1 network obtained from $N'$ by removing $y_1$, its parent
vertex, and their 3 incident arcs (suppressing resulting degree 2 vertices) must equal
$N - \{y_1, y_1'\}$ with $t$ replaced by one of $SL_1^j(\{t, t'\})$, $j \in Y$. Denote
that element in $Y$ by $j_{N'}$. Since
$\{y_1|p y_2,\ y_2|p y_1,\ y_2|y_1 y,\ p|y_1 y,\ y|y_1 p,\ y_2|p y,\ t\} \subseteq \mathcal{R} = \mathcal{R}(N')$
it is easy to check that $j_N = j_{N'}$ must hold and so $N$ and $N'$ must be equal,
which is again impossible. Thus, $|\mathcal{L}_1(\mathcal{R}(N))| = 1$ must hold in
case $\rho \neq y_2'$. Using arguments similar to the previous ones it is
straightforward to see that $N = N'$ and thus $|\mathcal{L}_1(\mathcal{R}(N))| = 1$
must hold in case $\rho = y_2'$.

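(iv) $\Rightarrow$ (iii): Suppose that $N$ is a level-1 network with
$|\mathcal{L}_1(\mathcal{T}(N))| > 1$. Then there exists a level-1 network
$N' \in \mathcal{L}_1(\mathcal{T}(N))$ distinct from $N$ with
$\mathcal{T}(N) = \mathcal{T}(N')$. But then
$\mathcal{S}(N) = \bigcup_{T \in \mathcal{T}(N)} \mathcal{C}(T) = \bigcup_{T \in \mathcal{T}(N')} \mathcal{C}(T) = \mathcal{S}(N')$
and so $N' \in \mathcal{L}_1(\mathcal{S}(N))$. Thus,
$|\mathcal{L}_1(\mathcal{S}(N))| > 1$.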

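(iii) $\Rightarrow$ (ii): Suppose that $N$ is a level-1 network with
$|\mathcal{L}_1(\mathcal{S}(N))| > 1$. Then there exists a level-1 network
$N' \in \mathcal{L}_1(\mathcal{S}(N))$ distinct from $N$ such that
$\mathcal{S}(N) = \mathcal{S}(N')$. But then Lemma 2 implies
$\mathcal{R}(N) = \bigcup_{C \in \mathcal{S}(N)} \mathcal{R}(C) = \bigcup_{C \in \mathcal{S}(N')} \mathcal{R}(C) = \mathcal{R}(N')$
and so $N' \in \mathcal{L}_1(\mathcal{R}(N))$. Hence,
$|\mathcal{L}_1(\mathcal{R}(N))| > 1$. ∎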

It should be noted that Theorem 1 immediately implies 

Corollary 1. Let $N$ be a level-1 network with at least 3 leaves. The number of
non-isomorphic level-1 networks $N'$ that reflect $\mathcal{R}(N)$ (or equivalently
for which $\mathcal{T}(N) = \mathcal{T}(N')$ or equivalently
$\mathcal{S}(N) = \mathcal{S}(N')$ holds) is $3^b$, where $b$ is the number of blobs
of $N$ of size four.
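The case $b = 1$ of Corollary 1, and with it the implication (i) $\Rightarrow$ (iv) of Theorem 1, can be checked by brute force on three leaves. The sketch below is an illustration under stated assumptions: the three arc lists are reconstructed here and are not copied from Fig. 5, the vertex names `r`, `u`, `v`, `h` are hypothetical, and each displayed tree is represented by its set of clusters, which determines the tree.

```python
from itertools import product

def displayed_cluster_sets(arcs, leaves):
    """For every way of keeping one incoming arc per reticulation vertex,
    return the cluster set of the resulting tree (as a frozenset of frozensets)."""
    parents = {}
    for u, v in arcs:
        parents.setdefault(v, []).append(u)
    reticulations = [v for v, ps in parents.items() if len(ps) > 1]
    vertices = {w for arc in arcs for w in arc}
    trees = set()
    for choice in product(*(parents[v] for v in reticulations)):
        kept = dict(zip(reticulations, choice))
        children = {}
        for u, v in arcs:
            if v not in kept or kept[v] == u:   # drop the unchosen reticulation arcs
                children.setdefault(u, []).append(v)

        def below(v):
            """Leaves reachable from v in the current switching."""
            if v in leaves:
                return frozenset([v])
            return frozenset().union(*(below(c) for c in children.get(v, [])))

        trees.add(frozenset(below(v) for v in vertices) - {frozenset()})
    return trees

leaves = {"a", "b", "c"}
# Three networks whose blob {r, u, v, h} has four vertices; h is the reticulation.
N1 = [("r", "u"), ("r", "v"), ("u", "h"), ("v", "h"), ("u", "a"), ("v", "c"), ("h", "b")]
N2 = [("r", "u"), ("u", "v"), ("v", "h"), ("r", "h"), ("u", "a"), ("v", "b"), ("h", "c")]
N3 = [("r", "u"), ("u", "v"), ("v", "h"), ("r", "h"), ("u", "c"), ("v", "b"), ("h", "a")]

T1, T2, T3 = (displayed_cluster_sets(N, leaves) for N in (N1, N2, N3))
same_trees = (T1 == T2 == T3)   # all three display ((a,b),c) and (a,(b,c))
```

The three networks differ in which leaf hangs below the reticulation vertex, so they are pairwise non-isomorphic, yet they display the same two trees. This matches the count $3^1 = 3$ for a single blob of size four.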

We remark that the strategy underlying the proof of Theorem 1 does not immediately
extend to level-$k$ networks with $k \geq 2$. The main reasons for this are that, as
already mentioned above, for $k \geq 2$ the number of distinct level-$k$ generators
grows exponentially in $k$ [GBP09], and that the problem of understanding when two
distinct simple level-2 networks reflect the same set of triplets is far less well
understood. For example, consider the two level-2 networks depicted in Figure 7. Each
one of them is a simple level-2 network obtained by hanging leaves off the sides of
two of the level-2 generators depicted in Figure 3. As can be quickly verified, both
networks reflect the same triplet set. However, adding additional leaves to both
networks by subdividing the arc one of whose end vertices forms an arc with $x_1$ and
the other forms an arc with $x_2$ and attaching additional leaves results in two
distinct level-2 networks that still reflect the same triplet system. Regarding the
accurate reconstruction of level-$k$ networks from e.g. triplet data, this result
highlights a serious limitation of level-2 networks (and probably level-$k$ networks
in general) as two such networks with very different structure might reflect the same
triplet set.




Figure 7. Both simple level-2 networks reflect the triplet set
$\{a|x_1 b,\ b|x_1 a,\ x_1|ab,\ a|x_2 b,\ b|x_2 a,\ x_2|ab,\ x_1|x_2 a,\ a|x_1 x_2,\ x_1|x_2 b,\ b|x_1 x_2\}$.

We conclude with remarking that phylogenetic trees on $X$ can also be viewed
as trees together with a bijective labelling map between $X$ and the leaf set of such
trees. Taking this point of view, phylogenetic trees were generalized in [MH06] to
MUL-trees by allowing two or more leaves of that tree to have the same label. For
example, the tree obtained from the phylogenetic tree depicted in Figure 1(c) by
replacing the leaf labelled $a$ by the cherry labelled $\{a, b\}$ is such a tree. In
fact, this is the MUL-tree induced by the level-1 network $N$ depicted in Figure 1(a)
that shows all paths from the root of $N$ to all leaves of $N$. For a level-1 network
$N$ it is easily seen that the MUL-tree $M(N)$ induced by $N$ this way is in fact an
encoding of $N$ in the sense that $N$ is the unique level-1 network that can give
rise to $M(N)$.